The Role of Eigenvalues in Linear Feature
Selection Theory

D. R. Brown and M. J. O'Malley

Introduction. Recent statistical work in feature selection for the multivariate
normal pattern recognition problem has concentrated on linearly transforming
pattern classes so that the transformed pattern classes are equivalently
distinguishable. Since, in general, this is not possible, techniques have been
developed to preserve the distinction of the transformed pattern classes using
various measures of distinction. These measures of pattern class distinction
are most often treated as eigenvalue problems ([1], [2], [5], [6], [7], [9],
[13], [14], [15]). In this paper we consider a particular measure of pattern
class distinction called the average interclass divergence, or more simply,
divergence ([1], [2], [4], [6], [7], [8], [9], [10], [11]), where divergence
will be the pairwise average of the expected interclass divergence derived from
Hajek's two-class divergence as defined, for example, in [9].


This work was supported in part by NASA under Contract vlSC-NAS-15000.

It has been shown in [4] that there always exists a k x n real matrix
B such that the transformation determined by B maximizes divergence in
k-dimensional space, and, in fact, that B can be written in the form
$(I_k \mid Z)U$, where U is an orthogonal n x n matrix, $I_k$ is the k x k
identity matrix, and Z is a zero block. We will investigate the
role of the eigenvalues of U in such problems, and give an example
demonstrating that the divergence measure of pattern class distinction does not
depend on these eigenvalues (Theorem 7).

Our example is derived from the family of examples constructed in [3].
This special class of examples permits analytical calculation of divergence,
a task ordinarily eschewed as unrealistic, and yields a precise expression
for divergence. The reader is cautioned, however, not to confuse the numerical
simplicity of this example with impracticality, since, mathematically, the
failure of the eigenvalues of U to affect divergence in the restricted case
erases any hope that they might be meaningful in an arbitrary case, however
applied.


1. Special divergence formulas. Let $\Omega_1, \dots, \Omega_m$ and
$\mu_1, \dots, \mu_m$ be the covariance matrices and means for m classes,
where for each i = 1,...,m, $\Omega_i$ is an n x n positive definite matrix
and $\mu_i$ is a column n vector. Let

$$\delta_{ij} = \mu_i - \mu_j, \qquad S_i = \sum_{j \ne i} \left(\Omega_j + \delta_{ij}\delta_{ij}^T\right).$$

Then, assuming equal a priori probabilities, the average interclass divergence
for these m classes is given by

$$D = \tfrac{1}{2}\operatorname{tr}\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big) - \tfrac{1}{2}\, m(m-1)\, n \tag{1}$$

while, if B is a k x n matrix, the B-average interclass divergence is

$$D_B = \tfrac{1}{2}\operatorname{tr}\Big(\sum_{i=1}^{m} (B\Omega_i B^T)^{-1}(B S_i B^T)\Big) - \tfrac{1}{2}\, m(m-1)\, k \tag{2}$$

where tr represents the trace function.

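Moreover, as observed in [3], if

$$\mathcal{C} = \{B \in M_{k \times n} : BB^T = I_k \text{ and } (B^TB)\Omega_i = \Omega_i(B^TB),\ i = 1,\dots,m\},$$

where $I_k$ is the k x k identity matrix and $M_{k \times n}$ is the set of all
k x n real matrices, then, for any $B \in \mathcal{C}$, (2) may be rewritten as

$$D_B = \tfrac{1}{2}\operatorname{tr}\Big(B\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big)B^T\Big) - \tfrac{1}{2}\, m(m-1)\, k \tag{3}$$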


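For the remainder of the paper we assume that each $\Omega_i$ is a diagonal
matrix of the form

$$\Omega_i = \begin{pmatrix} x_i & 0 \\ 0 & I_{n-1} \end{pmatrix},$$

where $x_i$ is a positive real number, and $\mu_i = \mu_j$ for all i, j. Under
these restrictions, $\sum_{i=1}^{m} \Omega_i^{-1} S_i$ is a diagonal matrix of
the form

$$\sum_{i=1}^{m} \Omega_i^{-1} S_i = \begin{pmatrix} x & 0 \\ 0 & p\, I_{n-1} \end{pmatrix},$$

where $x = \sum_{i=1}^{m} x_i^{-1}\big(\sum_{j \ne i} x_j\big)$ and
$p = m(m-1)$. It follows from (1) that the average interclass divergence for
the m classes is given by

$$D = \tfrac{1}{2}(x - p) \tag{4}$$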
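As a sanity check on (4), the following sketch evaluates formula (1) directly for equal means and covariances diag(x_i, 1, ..., 1) and compares it with the closed form (x - p)/2. The function names and the sample values of x_i are illustrative assumptions, not taken from the paper; exact rational arithmetic is used so the comparison is not affected by rounding.

```python
from fractions import Fraction

def divergence_from_formula_1(xs, n):
    """Evaluate (1) for equal means and covariances diag(x_i, 1, ..., 1).

    With equal means, S_i is the sum of the other covariance matrices, so it
    is diagonal with first entry sum_{j != i} x_j and remaining entries m - 1.
    """
    m = len(xs)
    trace = Fraction(0)
    for i, xi in enumerate(xs):
        s_first = sum(xj for j, xj in enumerate(xs) if j != i)
        trace += Fraction(s_first) / xi + (n - 1) * (m - 1)
    return trace / 2 - Fraction(m * (m - 1) * n, 2)

def divergence_closed_form(xs):
    """Closed form (4): D = (x - p) / 2 with p = m(m - 1)."""
    m = len(xs)
    x = sum(Fraction(xj) / xi
            for i, xi in enumerate(xs)
            for j, xj in enumerate(xs) if j != i)
    return (x - m * (m - 1)) / 2

# Hypothetical sample data: three classes in dimension n = 5.
sample_xs = [Fraction(1), Fraction(2), Fraction(4)]
d_direct = divergence_from_formula_1(sample_xs, n=5)
d_closed = divergence_closed_form(sample_xs)
print(d_direct, d_closed)
```

Both evaluations agree, and the value does not depend on n, as (4) predicts.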


As observed in the introduction, in seeking to maximize the B-average
interclass divergence $D_B$, it suffices to consider those k x n matrices of
the form $(I_k \mid Z)U$, where U is an n x n orthogonal matrix. In the sequel,
when considering $D_B$, we shall always assume that B is of this form. For
any such k x n matrix B, it is obvious that $BB^T = I_k$, and hence
$B \in \mathcal{C}$ if and only if $(B^TB)\Omega_i = \Omega_i(B^TB)$ for
i = 1,...,m. We will derive necessary and sufficient conditions in order that
$B \in \mathcal{C}$ (Theorem 2), but first we calculate $D_B$ in the case that
formula (3) is valid. Recall that all means are hereafter considered equal and
all covariance matrices diagonal of the form stated above.


Theorem 1. Let $B = (I_k \mid Z)U$, where $U = (u_{ij})$ is an n x n orthogonal
matrix, and suppose $D_B$ is given as in (3) above. Then

$$D_B = \Big(\sum_{j=1}^{k} u_{j1}^2\Big) D \tag{5}$$


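Proof. Since tr(XY) = tr(YX) whenever both products are defined, we have
in this case
$D_B = \tfrac{1}{2}\operatorname{tr}\big(B^TB(\sum_{i=1}^{m} \Omega_i^{-1} S_i)\big) - \tfrac{1}{2}\, pk$.
If U is written in block form,

$$U = \begin{pmatrix} A & C \\ E & F \end{pmatrix},$$

where A is k x k, then

$$B^TB = U^T (I_k \mid Z)^T (I_k \mid Z)\, U = \begin{pmatrix} A^TA & A^TC \\ C^TA & C^TC \end{pmatrix}.$$

Since

$$\sum_{i=1}^{m} \Omega_i^{-1} S_i = \begin{pmatrix} x & 0 \\ 0 & p\, I_{n-1} \end{pmatrix} = \begin{pmatrix} M & 0 \\ 0 & p\, I_{n-k} \end{pmatrix},$$

where M is the k x k matrix $\begin{pmatrix} x & 0 \\ 0 & p\, I_{k-1} \end{pmatrix}$, then

$$B^TB\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big) = \begin{pmatrix} A^TAM & p\,A^TC \\ C^TAM & p\,C^TC \end{pmatrix}.$$

Therefore,

$$\operatorname{tr}\Big(B^TB\Big(\sum_{i=1}^{m} \Omega_i^{-1} S_i\Big)\Big) = \operatorname{tr}(A^TAM) + p\operatorname{tr}(C^TC)
= \Big(\sum_{j=1}^{k} u_{j1}^2\Big)x + p\Big(\sum_{q=2}^{k}\sum_{j=1}^{k} u_{jq}^2 + \sum_{q=k+1}^{n}\sum_{j=1}^{k} u_{jq}^2\Big)
= \Big(\sum_{j=1}^{k} u_{j1}^2\Big)x + p\sum_{q=2}^{n}\sum_{j=1}^{k} u_{jq}^2 .$$

Since U is orthogonal,
$\sum_{q=2}^{n}\sum_{j=1}^{k} u_{jq}^2 = \sum_{j=1}^{k}(1 - u_{j1}^2) = k - \sum_{j=1}^{k} u_{j1}^2$,
so that

$$D_B = \tfrac{1}{2}\Big(\Big(\sum_{j=1}^{k} u_{j1}^2\Big)x + p\Big(k - \sum_{j=1}^{k} u_{j1}^2\Big)\Big) - \tfrac{1}{2}\, pk
= \Big(\sum_{j=1}^{k} u_{j1}^2\Big)\tfrac{1}{2}(x - p) = \Big(\sum_{j=1}^{k} u_{j1}^2\Big) D .$$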
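The identity of Theorem 1 can be checked numerically. The sketch below evaluates the right-hand side of (3) with the diagonal matrix diag(x, p, ..., p) and compares it with the weight (the squared sum of the first k entries of the first column of U) times D = (x - p)/2. The helper names, the rotation used for U, and the numerical values are assumptions chosen for illustration. Note that (3) coincides with the B-average divergence (2) only when B commutes appropriately with the covariances, so for a generic rotation angle this verifies the algebraic identity only.

```python
import math

def d_b_formula_3(U, k, x, p):
    """Right-hand side of (3) with sum_i Omega_i^{-1} S_i = diag(x, p, ..., p).

    B is the first k rows of U, so tr(B M B^T) = sum_j sum_q u_jq^2 * M_qq.
    """
    n = len(U)
    diag_m = [x] + [p] * (n - 1)
    trace = sum(U[j][q] ** 2 * diag_m[q] for j in range(k) for q in range(n))
    return 0.5 * trace - 0.5 * p * k

def rotation_first_last(theta, n):
    """Orthogonal matrix rotating coordinates 1 and n by theta."""
    U = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    U[0][0], U[0][n - 1] = math.cos(theta), math.sin(theta)
    U[n - 1][0], U[n - 1][n - 1] = -math.sin(theta), math.cos(theta)
    return U

# Hypothetical values: n = 4, k = 2, x = 9.25, p = 6 (three classes).
n_dim, k_dim, x_val, p_val = 4, 2, 9.25, 6.0
U_rot = rotation_first_last(0.7, n_dim)
weight = sum(U_rot[j][0] ** 2 for j in range(k_dim))
d_full = 0.5 * (x_val - p_val)
print(d_b_formula_3(U_rot, k_dim, x_val, p_val), weight * d_full)
```

The two printed numbers agree up to floating-point rounding for any rotation angle.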


Our next result gives necessary and sufficient conditions in order that
$B = (I_k \mid Z)U \in \mathcal{C}$. While the proof is rather tedious, these
conditions are particularly easy to apply and hence useful in seeking examples.

Theorem 2. Let $B = (I_k \mid Z)U$, where $U = (u_{ij})$ is an n x n orthogonal
matrix. If, for each i = 1,...,m,
$\Omega_i = \begin{pmatrix} x_i & 0 \\ 0 & I_{n-1} \end{pmatrix}$, then:

(1) if $x_i = 1$ for all i, then $B \in \mathcal{C}$;

(2) if $x_i \ne 1$ for at least one i, then $B \in \mathcal{C}$ if and only if
$\sum_{j=1}^{k} u_{j1}^2 = 1$ or $\sum_{j=1}^{k} u_{j1}^2 = 0$.

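Proof: If $x_i = 1$, then $\Omega_i = I_n$ and $(B^TB)\Omega_i = \Omega_i(B^TB)$
for any k x n matrix B. Thus, if $x_i = 1$ for all i, then $B \in \mathcal{C}$
for any k x n matrix of the form $(I_k \mid Z)U$. We suppose that $x_i \ne 1$
for at least one i. As in the proof of Theorem 1, we decompose U into the block
form $\begin{pmatrix} A & C \\ E & F \end{pmatrix}$, so that

$$B^TB = \begin{pmatrix} A^TA & A^TC \\ C^TA & C^TC \end{pmatrix},$$

where A is again k x k. For a fixed i such that $x_i \ne 1$, write $\Omega_i$
in block form $\begin{pmatrix} G_i & 0 \\ 0 & I_{n-k} \end{pmatrix}$, where
$G_i$ is the k x k matrix $\begin{pmatrix} x_i & 0 \\ 0 & I_{k-1} \end{pmatrix}$.
Then

$$(B^TB)\Omega_i = \begin{pmatrix} A^TAG_i & A^TC \\ C^TAG_i & C^TC \end{pmatrix}, \quad\text{while}\quad
\Omega_i(B^TB) = \begin{pmatrix} G_iA^TA & G_iA^TC \\ C^TA & C^TC \end{pmatrix}.$$

Thus, $B^TB$ commutes with $\Omega_i$ if and only if
(1) $A^TAG_i = G_iA^TA$ and (2) $C^TAG_i = C^TA$. We write $A^TA$ and $C^TA$ in
block form:

$$A^TA = \begin{pmatrix} L & M \\ N & W \end{pmatrix}, \qquad C^TA = \begin{pmatrix} P & Q \\ R & S \end{pmatrix},$$

where L and P are 1 x 1. Since $A^TA$ is symmetric, $N = M^T$. Therefore,

$$A^TAG_i = \begin{pmatrix} Lx_i & M \\ M^Tx_i & W \end{pmatrix} \quad\text{and}\quad
G_iA^TA = \begin{pmatrix} x_iL & x_iM \\ M^T & W \end{pmatrix}.$$

Thus $A^TAG_i = G_iA^TA$ if and only if $M = x_iM$, and similarly,
$C^TAG_i = C^TA$ if and only if $Px_i = P$ and $Rx_i = R$. Since

$$M = \Big(\sum_{j=1}^{k} u_{j1}u_{j2}, \dots, \sum_{j=1}^{k} u_{j1}u_{jk}\Big) \quad\text{and}\quad
\begin{pmatrix} P \\ R \end{pmatrix} = \Big(\sum_{j=1}^{k} u_{j,k+1}u_{j1}, \dots, \sum_{j=1}^{k} u_{jn}u_{j1}\Big)^T,$$

it follows that $Mx_i = M$, $Px_i = P$, and $Rx_i = R$ if and only if
$x_i\big(\sum_{j=1}^{k} u_{j1}u_{jq}\big) = \sum_{j=1}^{k} u_{j1}u_{jq}$ for
q = 2,...,n. Thus, since $x_i \ne 1$, we have that
$(B^TB)\Omega_i = \Omega_i(B^TB)$ if and only if
$\sum_{j=1}^{k} u_{j1}u_{jq} = 0$ for q = 2,...,n.
Since the above argument is valid for any $\Omega_i$ for which $x_i \ne 1$,
and since $B^TB$ commutes with $\Omega_i$ for any i for which $x_i = 1$, it
follows that $B \in \mathcal{C}$ if and only if
$\sum_{j=1}^{k} u_{j1}u_{jq} = 0$ for q = 2,...,n. We next show that
$\sum_{j=1}^{k} u_{j1}u_{jq} = 0$ for q = 2,...,n if and only if
$\sum_{j=1}^{k} u_{j1}^2 = 1$ or $\sum_{j=1}^{k} u_{j1}^2 = 0$.

Since U is orthogonal,
$\sum_{j=1}^{k} u_{j1}u_{jq} + \sum_{j=k+1}^{n} u_{j1}u_{jq} = 0$ for
q = 2,...,n, while $1 = \sum_{j=1}^{k} u_{j1}^2 + \sum_{j=k+1}^{n} u_{j1}^2$.
Thus, if $\sum_{j=1}^{k} u_{j1}^2 = 1$, then $u_{j1} = 0$ for j = k+1,...,n, and
$\sum_{j=1}^{k} u_{j1}u_{jq} = -\sum_{j=k+1}^{n} u_{j1}u_{jq} = 0$ for
q = 2,...,n. If $\sum_{j=1}^{k} u_{j1}^2 = 0$, then $u_{j1} = 0$ for j = 1,...,k
and, obviously, $\sum_{j=1}^{k} u_{j1}u_{jq} = 0$ for q = 2,...,n.

Conversely, suppose that $\sum_{j=1}^{k} u_{j1}u_{jq} = 0$ for q = 2,...,n. If
$u_{11} = \dots = u_{k1} = 0$, then $\sum_{j=1}^{k} u_{j1}^2 = 0$ and the proof
is complete. Otherwise, let $u_{r1}$ be the first non-zero element in the first
column of U, where $r \le k$. Then
$0 = \sum_{j=r}^{k} u_{j1}u_{jq}$, so that

$$u_{rq} = -\frac{1}{u_{r1}} \sum_{j=r+1}^{k} u_{j1}u_{jq}, \qquad q = 2,\dots,n.$$

If $u_{j1} = 0$ for j = r+1,...,k, then $u_{rq} = 0$ for q = 2,...,n and it
follows that $1 = u_{r1}^2 = \sum_{j=1}^{k} u_{j1}^2$.

Suppose $u_{w1} \ne 0$ where $r < w \le k$. Since
$u_{w1}u_{r1} + \sum_{q=2}^{n} u_{wq}u_{rq} = 0$, then, substituting for
$u_{rq}$, $q \ge 2$, we have

$$u_{r1}u_{w1} - \frac{1}{u_{r1}} \sum_{j=r+1}^{k} u_{j1}\Big(\sum_{q=2}^{n} u_{wq}u_{jq}\Big) = 0. \tag{6}$$

Since U is orthogonal, then for $j \ne w$,
$\sum_{q=2}^{n} u_{wq}u_{jq} = -u_{w1}u_{j1}$, and for $j = w$,
$\sum_{q=2}^{n} u_{wq}^2 = 1 - u_{w1}^2$. It follows that

$$\sum_{j=r+1}^{k} u_{j1}\Big(\sum_{q=2}^{n} u_{wq}u_{jq}\Big) = u_{w1}\Big(1 - \sum_{j=r+1}^{k} u_{j1}^2\Big).$$

Substituting in (6), we have
$u_{w1}u_{r1} - \frac{u_{w1}}{u_{r1}}\big(1 - \sum_{j=r+1}^{k} u_{j1}^2\big) = 0$.
Multiplying by $u_{r1}$, we have

$$u_{w1}\Big(u_{r1}^2 + \sum_{j=r+1}^{k} u_{j1}^2 - 1\Big) = u_{w1}\Big(\sum_{j=r}^{k} u_{j1}^2 - 1\Big) = 0.$$

Since $u_{w1} \ne 0$, it now follows that
$1 = \sum_{j=r}^{k} u_{j1}^2 = \sum_{j=1}^{k} u_{j1}^2$.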


We note that, if there exists at least one $\Omega_j$ which is not the identity
matrix, then the proof of Theorem 2 shows that $B^TB$ commutes with all
$\Omega_i$'s if and only if $B^TB$ commutes with $\Omega_j$. Moreover, in this
case, the elements of $\mathcal{C}$ are precisely those $B = (I_k \mid Z)U$ for
which the first column of U is of the form

$$(u_{11}, \dots, u_{k1}, 0, \dots, 0)^T \quad\text{or}\quad (0, \dots, 0, u_{k+1,1}, \dots, u_{n1})^T .$$

Hence, by Theorem 1, if $B \in \mathcal{C}$, then $D_B = D$ or $D_B = 0$. (Note
that if $\Omega_i = I_n$ for all i, then D = 0.)

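We close this section with a definition. If V denotes the set of all
n x n orthogonal matrices, let

$$\mathcal{J} = \Big\{U = (u_{ij}) \in V : \sum_{j=1}^{k} u_{j1}^2 = 1 \text{ or } 0\Big\}.$$

Thus, if there exists $\Omega_i \ne I_n$, then $B = (I_k \mid Z)U \in \mathcal{C}$
if and only if $U \in \mathcal{J}$.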
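The criterion of Theorem 2 is easy to test numerically. The sketch below forms B^T B from the first k rows of an orthogonal U, checks whether it commutes with a covariance diag(x, 1, ..., 1) with x different from 1, and compares the outcome with the condition that the squared sum of the first k entries of the first column of U is 0 or 1. All function names, the tolerance, and the sample matrices are illustrative assumptions.

```python
import math

def gram_first_k_rows(U, k):
    """Return B^T B, where B consists of the first k rows of U."""
    n = len(U)
    return [[sum(U[j][r] * U[j][c] for j in range(k)) for c in range(n)]
            for r in range(n)]

def commutes_with_omega(U, k, x, tol=1e-12):
    """Check (B^T B) Omega == Omega (B^T B) for Omega = diag(x, 1, ..., 1)."""
    n = len(U)
    G = gram_first_k_rows(U, k)
    omega = [x] + [1.0] * (n - 1)
    # Omega is diagonal, so right-multiplication scales columns and
    # left-multiplication scales rows.
    return all(abs(G[r][c] * omega[c] - omega[r] * G[r][c]) <= tol
               for r in range(n) for c in range(n))

def weight_is_zero_or_one(U, k, tol=1e-12):
    """Membership test: sum_{j<=k} u_{j1}^2 equals 0 or 1."""
    w = sum(U[j][0] ** 2 for j in range(k))
    return abs(w) <= tol or abs(w - 1.0) <= tol

identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
swap_1_3 = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
c, s = math.cos(0.7), math.sin(0.7)
rot_1_3 = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

for name, U in [("identity", identity3), ("swap", swap_1_3), ("rotation", rot_1_3)]:
    print(name, commutes_with_omega(U, 2, 2.0), weight_is_zero_or_one(U, 2))
```

For each sample matrix the two tests give the same answer, which is what Theorem 2 asserts for n = 3 and k = 2.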


2. Eigenvalues of U. Let $U = (u_{ij})$ be an n x n orthogonal matrix.
As is well known [12], the eigenvalues of U lie on the unit
circle in the complex plane and non-real eigenvalues occur in conjugate
pairs. Thus, if U has a real eigenvalue $\lambda$, then $\lambda = \pm 1$, and, if
$\mu = a + bi$, $b \ne 0$, is an eigenvalue of U, then $\bar{\mu} = a - bi$ is
also an eigenvalue of U. Clearly, det U = ±1. Moreover, if 1 has multiplicity p
as an eigenvalue of U, -1 multiplicity m, and
$\{a_j + b_j i,\ a_j - b_j i\}_j$ ($b_j \ne 0$) are the remaining eigenvalues
of U, then U is similar to a block diagonal orthogonal matrix $PUP^{-1}$ of
the form:

$$PUP^{-1} = \operatorname{diag}(1, \dots, 1,\ -1, \dots, -1,\ A_1, A_2, \dots) \tag{7}$$

where 1 appears on the diagonal p times, -1 appears m times, and each
$A_j = \begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$ is a 2x2
orthogonal matrix with eigenvalues $a_j + b_j i$, $a_j - b_j i$. Furthermore,
the order in which the $A_j$'s, 1's, and -1's appear on the diagonal can be
changed to any desired order by a similarity transformation. Thus, any two
orthogonal n x n matrices with the same set of eigenvalues are similar.
Finally, we observe that if U is a 2x2 orthogonal matrix, then

$$U = \begin{pmatrix} c & d \\ d & -c \end{pmatrix} \quad\text{or}\quad U = \begin{pmatrix} c & d \\ -d & c \end{pmatrix}, \quad\text{where } c^2 + d^2 = 1 .$$

Let $B = (I_k \mid Z)U \in \mathcal{C}$. For the remainder of the paper we will
be concerned with determining what role, if any, the eigenvalues of U play in
determining $D_B$. If $\{\lambda_1, \dots, \lambda_n\}$ is a set of n not
necessarily distinct complex numbers for which there exists an n x n orthogonal
matrix U with eigenvalues $\lambda_1, \dots, \lambda_n$, then we will say that
$\{\lambda_1, \dots, \lambda_n\}$ is a (*) set. We note that if
$T = \{\lambda_1, \dots, \lambda_n\}$ is a set of n not necessarily distinct
complex numbers such that T is closed under conjugation and every element of T
has modulus 1, then T is a (*) set. Throughout the following, we assume that
$1 \le k < n$, where k and n are positive integers, and we assume that at least
one covariance matrix $\Omega_i \ne I_n$.

Proposition 3. Let $\{\lambda_1, \dots, \lambda_n\}$ be a (*) set. Then there
exists an orthogonal matrix U with eigenvalues $\lambda_1, \dots, \lambda_n$
such that $B = (I_k \mid Z)U \in \mathcal{C}$ and $D_B = D$ if and only if one
of the following conditions holds:

(i) $\lambda_i$ is real for some i.

(ii) $k \ge 2$ and no $\lambda_i$ is real.


Proof: Observe that if at least one $\lambda_i$ is real, say $\lambda_1$, then
by (7) there exists a block diagonal orthogonal matrix U of the form
$U = \begin{pmatrix} \lambda_1 & 0 \\ 0 & C \end{pmatrix}$,
where C is an (n - 1) x (n - 1) block diagonal orthogonal matrix with
eigenvalues $\lambda_2, \dots, \lambda_n$. Thus, if $U = (u_{ij})$, then
$\sum_{j=1}^{k} u_{j1}^2 = u_{11}^2 = \lambda_1^2 = 1$,
so that $B = (I_k \mid Z)U \in \mathcal{C}$ (Theorem 2) and $D_B = D$
(Theorem 1). If no $\lambda_i$ is real, then n is even, and by (7) there exists
a block diagonal orthogonal matrix U with eigenvalues
$\lambda_1, \dots, \lambda_n$ such that
$U = \operatorname{diag}(A_1, \dots, A_{n/2})$, where each $A_j$ is a 2x2 matrix
of the form $\begin{pmatrix} a_j & b_j \\ -b_j & a_j \end{pmatrix}$,
$b_j \ne 0$. Thus, the first column of U is $(a_1, -b_1, 0, \dots, 0)^T$, and
hence, if $k \ge 2$, then $\sum_{j=1}^{k} u_{j1}^2 = a_1^2 + b_1^2 = 1$, so
that $B = (I_k \mid Z)U \in \mathcal{C}$ and $D_B = D$.

Conversely, suppose that k = 1. If there exists an orthogonal matrix U
with eigenvalues $\lambda_1, \dots, \lambda_n$ such that
$B = (I_k \mid Z)U \in \mathcal{C}$, then $U \in \mathcal{J}$. Thus,
if $D_B = D$, then U is of the form

$$U = \begin{pmatrix} a & 0 \\ 0 & C \end{pmatrix},$$

where a = ±1 and C is an (n - 1) x (n - 1) orthogonal matrix. Therefore, a is
an eigenvalue of U and $\lambda_i = a$ is real for some i.

It is natural to consider the analogous condition $D_B = 0$. That is,
given a (*) set $\{\lambda_1, \dots, \lambda_n\}$, does there exist an
orthogonal matrix U with these eigenvalues such that
$B = (I_k \mid Z)U \in \mathcal{C}$ and $D_B = 0$? The answer, as in
the preceding case, is no in general, but it is true in some important cases.

Proposition 4. Let $T = \{\lambda_1, \dots, \lambda_n\}$ be a (*) set. If either

(i) 1 and -1 ∈ T, or

(ii) i and -i ∈ T,

then there exists an orthogonal matrix U with eigenvalues
$\lambda_1, \dots, \lambda_n$ such that $B = (I_k \mid Z)U \in \mathcal{C}$ and
$D_B = 0$.


Proof. Let $\lambda_1$ and $\lambda_2$ denote the pair 1, -1 or i, -i, let H be
any (n - 2) x (n - 2) orthogonal matrix with eigenvalues
$\lambda_3, \dots, \lambda_n$, and let

$$U = \begin{pmatrix} 0 & Z & b_1 \\ Z & H & Z \\ b_2 & Z & 0 \end{pmatrix},$$

where Z denotes an (n - 2) row or column vector of zeros, and if
$\{\lambda_1, \lambda_2\} = \{1, -1\}$, then $b_1 = b_2 = 1$, and if
$\{\lambda_1, \lambda_2\} = \{i, -i\}$, then $b_1 = 1$, $b_2 = -1$.

Clearly, U is an orthogonal matrix. Moreover, the eigenvalues of U
are $\{\lambda_1, \dots, \lambda_n\}$, since
$\det(xI_n - U) = (x^2 - b_1b_2)\det(xI_{n-2} - H)$, and
hence the roots of $\det(xI_n - U) = 0$ are the roots of
$\det(xI_{n-2} - H) = 0$, together with the roots of $x^2 - b_1b_2 = 0$. Since
the roots of the former equation are the eigenvalues of H, it suffices to show
that $\lambda_1$ and $\lambda_2$ are the roots of $x^2 - b_1b_2 = 0$. This
follows immediately from the relationship defined between the values of
$\lambda_1$ and $\lambda_2$ and the choices of $b_1$ and $b_2$.

Thus, since we assume k < n, the first k entries of the first column of U
vanish, and Theorem 2 implies that $U \in \mathcal{J}$, so that
$B = (I_k \mid Z)U \in \mathcal{C}$, and, by Theorem 1, $D_B = 0$.
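The construction in the proof of Proposition 4 can be illustrated in a small case. The sketch below builds the bordered matrix with corner entries b_1, b_2 around a given orthogonal block H, checks orthogonality exactly with integer entries, and confirms the corner eigenvalues by exhibiting explicit eigenvectors. The eigenvectors e_1 ± e_n and e_1 ± i e_n, the choices of H, and all function names are assumptions made for this illustration and do not appear in the paper.

```python
def bordered_matrix(H, b1, b2):
    """Build the n x n matrix with b1 in the top-right corner, b2 in the
    bottom-left corner, H in the middle block, and zeros elsewhere."""
    n = len(H) + 2
    U = [[0] * n for _ in range(n)]
    U[0][n - 1] = b1
    U[n - 1][0] = b2
    for r, row in enumerate(H):
        for c, val in enumerate(row):
            U[r + 1][c + 1] = val
    return U

def is_orthogonal(U):
    """Exact check that U U^T is the identity (integer entries assumed)."""
    n = len(U)
    return all(sum(U[r][j] * U[c][j] for j in range(n)) == (1 if r == c else 0)
               for r in range(n) for c in range(n))

def is_eigenpair(U, lam, v):
    """Exact check that U v = lam v."""
    n = len(U)
    return all(sum(U[r][c] * v[c] for c in range(n)) == lam * v[r]
               for r in range(n))

# Case (i): the pair {1, -1}, with H a quarter-turn rotation.
U_real = bordered_matrix([[0, 1], [-1, 0]], 1, 1)
# Case (ii): the pair {i, -i}, with H = diag(1, -1).
U_imag = bordered_matrix([[1, 0], [0, -1]], 1, -1)

k = 2  # any k < n works; the first k entries of column 1 are zero
first_column_weight = sum(U_real[j][0] ** 2 for j in range(k))
print(is_orthogonal(U_real), is_orthogonal(U_imag), first_column_weight)
```

In both cases the first column is the last standard basis vector up to sign, so the squared sum of its first k entries is 0 for every k < n.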

Our next result shows that, if n = 3, then Proposition 4 does not
characterize those (*) sets T for which there exists an orthogonal matrix
U with set of eigenvalues T such that $B = (I_k \mid Z)U \in \mathcal{C}$ and
$D_B = 0$. We will obtain a partial extension of this result to arbitrary n and
we will make strong use of the extension in our main result, Theorem 7.

Lemma 5. Let n = 3, k = 2, and suppose that
$\{\lambda_1, \lambda_2, \lambda_3\}$ is a (*) set, where
$\lambda_1 = a + bi$, $\lambda_2 = a - bi$.

(1) If $\lambda_3 = 1$, then there exists a 3x3 orthogonal matrix
U with eigenvalues $\lambda_1, \lambda_2, \lambda_3$ such that
$U \in \mathcal{J}$ and $D_B = 0$, $B = (I_k \mid Z)U$, if and only if a, the
real part of $\lambda_1$ and $\lambda_2$, is less than or equal to zero;

(2) if $\lambda_3 = -1$, then there exists a 3x3 orthogonal matrix U
with eigenvalues $\lambda_1, \lambda_2, \lambda_3$ such that
$U \in \mathcal{J}$ and $D_B = 0$, $B = (I_k \mid Z)U$, if and only if a, the
real part of $\lambda_1$ and $\lambda_2$, is greater than or equal to zero.

Proof. Observe that if $U \in \mathcal{J}$ is such that $D_B = 0$, where
$B = (I_k \mid Z)U$, then by Theorems 1 and 2, U is of the form

$$U = \begin{pmatrix} 0 & A \\ v & 0 \end{pmatrix},$$

where v = ±1, A is a 2x2 orthogonal matrix, and the zeros denote a 2 x 1 and a
1 x 2 zero block. Moreover, if U has eigenvalues
$\lambda_1, \lambda_2, \lambda_3$, then
$\det(U) = \lambda_1\lambda_2\lambda_3 = \lambda_3$. Thus, if $\lambda_3 = 1$,
then det(U) = 1, and if $\lambda_3 = -1$, then det(U) = -1. We consider the
case $\lambda_3 = 1$, the case $\lambda_3 = -1$ being similar.

If v = 1, then, since $\det(U) = v\det(A) = 1$, A is of the form
$\begin{pmatrix} c & d \\ -d & c \end{pmatrix}$. Then
$\det(xI_3 - U) = x^3 + dx^2 - dx - 1$, so that the eigenvalues of U are

$$1, \quad \frac{-(1+d) \pm i\sqrt{3 - 2d - d^2}}{2} .$$

Thus, there exists U with eigenvalues $\lambda_1, \lambda_2, 1$ if and only if
there exists a real number d, $|d| \le 1$, such that

$$a = \frac{-(1+d)}{2}, \qquad b = \frac{\sqrt{3 - 2d - d^2}}{2}. \tag{8}$$

Since $|d| \le 1$, then $-(1+d)/2 \le 0$, and thus, if U exists, then
$a \le 0$. Conversely, if $a \le 0$, then $d = -(1+2a)$ satisfies both
equations in (8) and $|d| \le 1$. If v = -1, then
$A = \begin{pmatrix} c & d \\ d & -c \end{pmatrix}$, and the eigenvalues of
U are

$$1, \quad \frac{(d-1) \pm i\sqrt{3 + 2d - d^2}}{2} .$$

An argument similar to the preceding one shows that there exists U with
eigenvalues $\lambda_1, \lambda_2, 1$ if and only if $a \le 0$.
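The case v = 1 of the preceding proof can be checked numerically. The sketch below assumes a concrete matrix consistent with that case, namely rows (0, c, d), (0, -d, c), (1, 0, 0) with d = -(1 + 2a) and c = sqrt(1 - d^2); this explicit layout and the function names are assumptions for illustration. It verifies orthogonality and that 1 and a + bi are roots of x^3 + d x^2 - d x - 1.

```python
import math

def lemma5_matrix(a):
    """Return (U, d) for a real part a with -1 <= a <= 0.

    U has first column (0, 0, 1)^T, so the first two entries of that column
    vanish (the case k = 2, n = 3).
    """
    d = -(1.0 + 2.0 * a)
    c = math.sqrt(max(0.0, 1.0 - d * d))
    U = [[0.0, c, d], [0.0, -d, c], [1.0, 0.0, 0.0]]
    return U, d

def char_poly(x, d):
    """The cubic x^3 + d x^2 - d x - 1 from the proof (x may be complex)."""
    return x ** 3 + d * x ** 2 - d * x - 1

def orthogonality_defect(U):
    """Largest absolute entry of U U^T - I."""
    n = len(U)
    return max(abs(sum(U[r][j] * U[c][j] for j in range(n)) - (1.0 if r == c else 0.0))
               for r in range(n) for c in range(n))

a_val = -0.25                           # hypothetical real part, a <= 0
b_val = math.sqrt(1.0 - a_val ** 2)     # modulus-one eigenvalue a + bi
U_lem, d_val = lemma5_matrix(a_val)
root_residual = abs(char_poly(complex(a_val, b_val), d_val))
print(d_val, orthogonality_defect(U_lem), root_residual)
```

The trace of this matrix is -d = 1 + 2a, which matches the sum of the eigenvalues 1, a + bi, a - bi.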


Corollary 6. Let n and k be positive integers, $1 \le k < n$, and suppose
that $T = \{\lambda_1, \dots, \lambda_n\}$ is a (*) set.

(1) If 1 ∈ T and if there exists a + bi ∈ T, with $a \le 0$, then there
exists an n x n orthogonal matrix U with eigenvalues T such
that $U \in \mathcal{J}$ and $D_B = 0$, where $B = (I_k \mid Z)U$.

(2) If -1 ∈ T and if there exists a + bi ∈ T, with $a \ge 0$, then
there exists an n x n orthogonal matrix U with eigenvalues T
such that $U \in \mathcal{J}$ and $D_B = 0$, where $B = (I_k \mid Z)U$.


Proof. By Lemma 5 and its proof, if $a \le 0$, then

$$A = \begin{pmatrix} 0 & c & d \\ 0 & -d & c \\ 1 & 0 & 0 \end{pmatrix},$$

where $d = -(1 + 2a)$ and $c^2 + d^2 = 1$, is an orthogonal matrix with
eigenvalues 1, a + bi, a - bi. Thus, if $\tilde{U}$ is the n x n block diagonal
matrix $\begin{pmatrix} A & 0 \\ 0 & H \end{pmatrix}$, where H is an
(n - 3) x (n - 3) orthogonal matrix with eigenvalues
$T \setminus \{1, a + bi, a - bi\}$, then $\tilde{U}$ is an orthogonal matrix
with eigenvalues the elements of T. Therefore, if U is the n x n matrix
obtained from $\tilde{U}$ by interchanging the third and n-th rows and columns
of $\tilde{U}$, then U is orthogonal, and, since U is similar to $\tilde{U}$,
the eigenvalues of U are also the elements of T. Finally, since the first
column of U is $(0, \dots, 0, 1)^T$, we have $U \in \mathcal{J}$, and, by
Theorems 1 and 2, $D_B = 0$, where $B = (I_k \mid Z)U$ and k < n. The proof of
(2) is similar.


We make a few additional observations before stating our main result.
Let U be an n x n orthogonal matrix with eigenvalues $\lambda_1$,
$\{a_j + b_j i\}_{j=2}^{n}$, where $b_j$ may be zero. Since tr(U) is the sum of
the eigenvalues of U, it follows that if $\lambda_1 = 1$ and $a_j > 0$ for
j = 2,...,n, then $\operatorname{tr}(U) = 1 + \sum_{j=2}^{n} a_j > 1$, while if
$\lambda_1 = -1$ and $a_j < 0$ for j = 2,...,n, then
$\operatorname{tr}(U) = -1 + \sum_{j=2}^{n} a_j < -1$. Also, if A is orthogonal
and det(A) = -1, then -1 is an eigenvalue of A. This follows immediately from
the fact that det(A) is the product of the eigenvalues of A, repeated to their
respective multiplicities. Finally, if A is orthogonal, n x n, and n is even,
then det(A) = -1 implies that both -1 and 1 are eigenvalues of A.

Theorem 7. Let n and k be positive integers, 1 < k < n, let U be
an n x n orthogonal matrix, and let $B = (I_k \mid Z)U$ be such that $D_B = D$.
If

$$\bar{U} = \begin{pmatrix} I_{n-1} & 0 \\ 0 & -1 \end{pmatrix} U$$

and if $\bar{B} = (I_k \mid Z)\bar{U}$, then $B = \bar{B}$, so that
$D_{\bar{B}} = D_B = D$. Either U or $\bar{U}$ is similar to an n x n
orthogonal matrix $U_0 \in \mathcal{J}$ such that $D_{B_0} = 0$, where
$B_0 = (I_k \mid Z)U_0$.


Proof. Note that the matrix $\bar{U}$ differs from U only in that the last row
of $\bar{U}$ is the negative of the last row of U. Clearly, since k < n, we
have $B = \bar{B}$.

Now suppose that n is even. If det(U) = -1, then 1 and -1 are
eigenvalues of U and thus, by Proposition 4, there exists an orthogonal
matrix $U_0$ similar to U such that $B_0 = (I_k \mid Z)U_0 \in \mathcal{C}$ and
$D_{B_0} = 0$. If det(U) = 1, then $\det(\bar{U}) = -1$, and the above argument
applied to $\bar{U}$ yields the same conclusion.

Suppose that n is odd. Then U must have at least one real eigenvalue,
$\lambda$. If $\lambda = 1$ and if U has another eigenvalue a + bi,
$a \le 0$, then the conclusion follows from (1) of Corollary 6. Similarly, if
$\lambda = -1$ and if U has another eigenvalue a + bi, $a \ge 0$, then the
conclusion follows from (2) of Corollary 6. Suppose now that $\lambda = 1$ is
an eigenvalue of U and that a > 0 for all other eigenvalues a + bi of U. Then
det(U) = 1 and tr(U) > 1. Since $\det(\bar{U}) = -1$, it follows that -1 is an
eigenvalue of $\bar{U}$, and, since $\operatorname{tr}(\bar{U})$ can differ
from tr(U) by at most 2, we have that $\operatorname{tr}(\bar{U}) > -1$. Thus,
$\bar{U}$ must have an eigenvalue of the form c + di, where c > 0, and hence,
by (2) of Corollary 6, there exists an orthogonal matrix $U_0$, similar to
$\bar{U}$, such that $U_0 \in \mathcal{J}$ and $D_{B_0} = 0$. The case in which
$\lambda = -1$ is an eigenvalue of U and a < 0 for all other eigenvalues
a + bi of U is handled in a similar manner, and we omit the proof.
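A small instance may help make Theorem 7 concrete. The sketch below takes n = 3, k = 2 and U equal to the identity, negates the last row to obtain the companion matrix, and compares it with the matrix produced by the construction of Proposition 4 for the eigenvalue pair 1, -1. The specific matrices and helper names are illustrative assumptions; the characteristic polynomial of a 3x3 matrix is compared through its trace, the sum of its principal 2x2 minors, and its determinant.

```python
def char_coeffs_3x3(M):
    """Return (trace, sum of principal 2x2 minors, determinant); these three
    numbers determine the characteristic polynomial of a 3x3 matrix."""
    tr = M[0][0] + M[1][1] + M[2][2]
    minors = (M[0][0] * M[1][1] - M[0][1] * M[1][0]
              + M[0][0] * M[2][2] - M[0][2] * M[2][0]
              + M[1][1] * M[2][2] - M[1][2] * M[2][1])
    det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
           - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
           + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    return tr, minors, det

def first_column_weight(U, k):
    """Sum of squares of the first k entries of the first column of U."""
    return sum(U[j][0] ** 2 for j in range(k))

k = 2
U = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Negate the last row of U; the first k rows are untouched because k < n.
U_bar = [row[:] for row in U[:-1]] + [[-v for v in U[-1]]]
# Matrix from the Proposition 4 construction with b1 = b2 = 1 and H = (1).
U_0 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

print(U[:k] == U_bar[:k])                             # same B from U and U_bar
print(char_coeffs_3x3(U_bar), char_coeffs_3x3(U_0))   # same characteristic polynomial
print(first_column_weight(U, k), first_column_weight(U_0, k))
```

Here the same B arises from U and from its companion, with weight 1 and hence divergence D, while a matrix sharing the companion's eigenvalues 1, 1, -1 has weight 0 and hence divergence 0.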


3. Conclusion. This paper provides an example to show that, even under
extremely strong conditions, the eigenvalues of U do not affect the value
of divergence in the space of reduced dimension, k.